Aki Shiroshita (Epidemiology PhD student) developed a tailored version of DeGAUSS specifically for the EV project.

About Original DeGAUSS

DeGAUSS (https://degauss.org/) is designed to derive environmental variables while preserving the privacy of protected health information (PHI). It uses Docker images to process address data. Users provide a CSV file containing address information and receive an output file with various environmental variables.

Limitations of Original DeGAUSS

Improvements in the Modified DeGAUSS

Modified DeGAUSS provides clean, processed output files with all PHI removed.

How to Use Modified DeGAUSS

The environment has already been set up for you. All you need to do is follow the instructions.

Step-by-Step Instructions

  1. Locate the Folder:

Navigate to the folder “C:_degauss_2025_08_14” on the Windows server (Cqshealth.dhcp.mc.vanderbilt.edu).

  2. Open R Project:

Launch RStudio.

Note: RStudio may take 1–2 minutes to open because its settings have been customized for this project. Each time you open it, wait until items appear in the Environment pane before continuing.

  3. Start Docker Desktop:

Open Docker Desktop for Windows.

  4. Run the Script:

Open the file test.R.

Execute the script section by section: place your cursor in a section and press Ctrl + Alt + T.

  5. Locate Output Files:

Processed data will be saved in the output folder of your choice.

This folder contains CSV files, including: tract.csv (used for the subject selection flow), final_data.csv (the final dataset for sharing with other researchers, with all PHI removed), tab_census.csv (census tract tabulation data), and tab_relocation.csv (relocation information).

Specific instructions for Huiping

Note: Your data will remain on the shared drive and will never leave the VUMC environment. The server will load data into memory for processing, but data will not be stored in local server folders. Any temporary cache generated during processing will be automatically removed.

Could you provide the path to the input folder containing the address data and the file name?

What is the path to the output folder where you’d like to store the processed data after all PHI has been removed?

If you would like to create a temporary folder in a different location to store intermediate files containing PHI, please specify the path.

Defining start date and end date

To define the start date and end date, we need to merge any overlapping or adjacent enrollment periods into single, continuous time spans. This ensures that continuous enrollment is not split by artificial breaks in the timeline.

The TennCare enrollment file looks like this:

recip  enrol_begin_date  enrol_end_date  address
1      2023-01-01        2024-01-02      123 Main St
1      2024-01-02        2025-03-02      456 Elm St
2      2022-01-02        2023-01-02      789 Oak St

not like this:

recip  registration_date  address
1      2023-01-01         123 Main St
1      2024-01-02         456 Elm St
2      2022-01-02         789 Oak St
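
To illustrate the merging of overlapping or adjacent enrollment periods described in this section, here is a minimal Python sketch. It is not the code in test.R. The function name merge_periods and the max_gap_days parameter are hypothetical, and two assumptions are made: periods are merged per recip regardless of address, and a period that begins at most one day after the previous span ends counts as adjacent.

```python
from datetime import date, timedelta


def merge_periods(periods, max_gap_days=1):
    """Merge overlapping or adjacent (begin, end) date pairs into continuous spans.

    max_gap_days is an assumed tolerance: a period that begins at most this
    many days after the previous span ends is treated as adjacent.
    """
    merged = []
    for begin, end in sorted(periods):
        if merged and begin <= merged[-1][1] + timedelta(days=max_gap_days):
            # Overlaps or touches the previous span: extend its end if needed.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            # A real gap: start a new span.
            merged.append([begin, end])
    return [tuple(span) for span in merged]


# Illustrative rows in the enrollment layout: (recip, enrol_begin_date, enrol_end_date).
rows = [
    (1, date(2023, 1, 1), date(2024, 1, 2)),
    (1, date(2024, 1, 2), date(2025, 3, 2)),
    (2, date(2022, 1, 2), date(2023, 1, 2)),
]

# Group periods by recipient (assumption: address is ignored when merging).
by_recip = {}
for recip, begin, end in rows:
    by_recip.setdefault(recip, []).append((begin, end))

spans = {recip: merge_periods(p) for recip, p in by_recip.items()}
```

With these rows, recipient 1 ends up with a single span from 2023-01-01 to 2025-03-02, and recipient 2 keeps its single period. If a different definition of "adjacent" applies to the project, only max_gap_days would need to change.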

Delete modified DeGAUSS

Once all processes are completed and the required outputs are finalized, I will delete the modified DeGAUSS from the server.